ABSTRACT

It is common to see people obtain knowledge on micro-blog
services by asking others decision making questions. In this
paper, we study the Jury Selection Problem (JSP) by utilizing
crowdsourcing for decision making tasks on micro-blog
services. Specifically, the problem is to enroll a subset of
the crowd under a limited budget, whose aggregated wisdom
via the Majority Voting scheme has the lowest probability of
drawing a wrong answer (Jury Error Rate, JER).

Due to the various individual error-rates of the crowd, the
calculation of JER is non-trivial. Firstly, we explicitly state
that JER is the probability that the number of wrong jurors
is larger than half of the size of a jury. To avoid the
exponentially increasing cost of calculating JER, we propose
two efficient algorithms and an effective bounding technique.
Furthermore, we study the Jury Selection Problem on two
crowdsourcing models: one for altruistic users (AltrM)
and the other for incentive-requiring users (PayM), who
require extra payment when enrolled into a task. For the
AltrM model, we prove the monotonicity of JER with respect
to individual error rate and propose an efficient exact algorithm
for JSP. For the PayM model, we prove the NP-hardness of
JSP on PayM and propose an efficient greedy-based heuristic
algorithm. Finally, we conduct a series of experiments to
investigate the traits of JSP, and validate the efficiency and
effectiveness of our proposed algorithms on both synthetic
and real micro-blog data.

1. INTRODUCTION

Crowdsourcing, partially categorized as human computation
or social computation, is an emerging computation
paradigm. It provides the fundamental infrastructure that
enables online users to participate in certain tasks as intellectual
crowds. Remarkably, the wisdom of crowds outperforms
computer programs at tasks involving creativity, natural human
interpretation, subjective comparison, etc. At the current
stage, typical crowdsourcing applications entail specially
designed platforms, like Amazon MTurk, to enroll crowd

Permission to make digital or hard copies of all or part of this work for 
personal or classroom use is granted without fee provided that copies are 
not made or distributed for profit or commercial advantage and that copies 
bear this notice and the full citation on the first page. To copy otherwise, to 
republish, to post on servers or to redistribute to lists, requires prior specific 
permission and/or a fee. Articles from this volume were invited to present 
their results at The 38th International Conference on Very Large Data Bases, 
August 27th - 31st 2012, Istanbul, Turkey. 
Proceedings of the VLDB Endowment, Vol. 5, No. 11
Copyright 2012 VLDB Endowment 2150-8097/12/07... $ 10.00. 



[Figure 1 graphic: candidate jurors A to G, each labelled with an individual error rate e and a payment requirement r, and three example juries with their jury error rates: {A,B,C,D,E} (0.0704), {A,B,C} (0.072) and {A,F,G} (0.208).]


Figure 1: Is Turkey in Europe or in Asia? 

workers, control task flow and aggregate answers. On such
platforms, crowd workers select tasks according to their own
interests and reward requirements; on the other side, task
requesters publish their tasks and wait for crowd workers
to accept and complete them in a random manner.

Will the magic power of crowdsourcing be confined solely
to specially designed platforms? Do crowd workers only
appear because of monetary rewards? The answer to both
questions is no. In this paper, we introduce a long-existing
pattern of crowdsourcing on the platform of a micro-blog
network, that is, gathering answers for decision making
questions from micro-blog followers.

Micro-blog services are popular social media, featuring
excellent brevity for broadcasting observations of events and
expressing users' opinions. This brevity is brought about by
the limited length of published content, e.g. 140 characters
for Twitter, and a terse markup culture (e.g. "RT"),
which make it easy and even motivating for users to
express their thoughts. The simplicity of micro-blog
services encourages people to present their thoughts freely.
Moreover, as mobile and web techniques advance, it becomes
easier and easier for users to "tweet" in various ways.
Besides high accessibility, the huge population and diversity
of users make micro-blog services a potential but powerful
knowledge base.

For the reasons above, micro-blog services are natural
platforms not only for spreading messages, but also for
crowdsourcing particular tasks, e.g. answering decision making
questions. Each day, people find it more and more convenient
and reliable to seek answers from micro-blog users,
for example, "Is it true that iPhone5 will come before August?"
or "Is Doner Kebab available in Hong Kong?", etc.
Such questions vary from minor problems, such as choosing
what to wear for a banquet, to serious issues, such as the
prediction of macro market trends. The magic point of such tasks
Table 1: Genome of Decision Making Tasks with Crowdsourcing [18]

Who      Why                  What       How
Anyone   Altruism+Incentive   Decision   Group Decision

on micro-blog networks is that question holders can actively
choose their potential "workers" by simply mentioning
them using the markup operator '@'. Later on, the "workers"
return their product with the simple "Reply" button.
Table 1 describes the genome of such tasks as a crowdsourcing
application.

Another type of formal application of such decision making
tasks is the discerning of rumors [4]. Rumors, such as
political astroturf and spam advertising [24], are spread on
micro-blog networks, and it is very difficult to discover and
identify them with automatic algorithms. The main reason is
that most rumors look seemingly the same as the truth, or
are expressed with plenty of rhetoric or sarcasm.
Discerning such rumors is thus a typical decision making
problem for online users, and one that micro-blog users have
long practiced. In practice, micro-blog users have been utilized
to monitor and identify earthquake information during the
disasters in Japan and Chile [27] [4].

Following the terminology of courtroom judgement, one of
the most typical decision making scenarios, in this paper we
denote crowdsourcing "workers" as "jurors" and the "crowd"
as a "jury". Although the "jurors" in a jury all wish to achieve
the same goal (to answer the question correctly), each of them
may make mistakes with a probability $\epsilon_i$. Meanwhile,
some of them will not participate in such a task unless a
certain incentive $r_i$ is offered. Not surprisingly, we hope to
choose the most reliable and feasible subset of all possible
"jurors" to vote on a question. Most reliable here means that
the probability of giving a wrong answer under majority rule
is minimized. A formal definition of the problem will be given
in Section 2, and a motivating example is given as follows.

Motivation Example Suppose we are given a decision
task with a set of candidate "jurors" S (i.e. all the users
in Figure 1); we have to decide whom we should ask for an
answer.

The first concern originates from the existence of defect,
i.e. the probability of making mistakes among users in a
jury. In Majority Voting (the most classical Condorcet-like
voting [9]), if more than half of the users vote wrongly, the
jury gives a wrong answer. For the example in Figure 1, if
we choose C, D, and E, with error-rates 0.2, 0.3, and 0.3
respectively, as a crowd, the probability of getting a wrong
answer from the entire crowd is

$0.2 \cdot 0.3 \cdot 0.3 + (1-0.2) \cdot 0.3 \cdot 0.3 + 2 \cdot 0.2 \cdot (1-0.3) \cdot 0.3 = 0.174$

This jury performs better than any of its members does
individually (0.2 if only C is selected and 0.3 if only D or E
is selected). Intuitively, we expect that the best jury comes
from the best individuals. And indeed, with A, B, and C,
the overall error-rate becomes 0.072, which is smaller than
with C, D and E. What if the jury is expanded with two
more individuals? After taking D and E into the jury, the
error-rate becomes 0.0704, which is even lower. Following such
intuition, when we take two more individuals F and G, we
find that the error-rate climbs to 0.085, which is worse than
that of the smaller jury of size 5. Given
such cases, we are interested in the problem of selecting



Table 2: Error-rate of Example in Figure 1

Crowd           Individual Error-rate         Jury Error-rate
C               0.2                           0.2
A               0.1                           0.1
C,D,E           0.2,0.3,0.3                   0.174
A,B,C           0.1,0.2,0.2                   0.072
A,B,C,D,E       0.1,0.2,0.2,0.3,0.3           0.0704
A,B,C,D,E,F,G   0.1,0.2,0.2,0.3,0.3,0.4,0.4   0.085
A,B,C,F,G       0.1,0.2,0.2,0.4,0.4           0.104

members for a jury with the lowest error-rate. 
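
The figures quoted in the motivation example can be checked by direct enumeration. The following is a minimal sketch, assuming an odd jury size and independent jurors; the function name `jer_bruteforce` and the `rates` dictionary are illustrative names and are not part of the paper.

```python
from itertools import combinations
from math import prod


def jer_bruteforce(error_rates):
    """Probability that more than half of the jurors vote wrongly.

    Enumerates every possible set of wrong jurors, so it is only
    practical for very small juries.
    """
    n = len(error_rates)
    if n % 2 == 0:
        raise ValueError("Majority Voting assumes an odd jury size")
    total = 0.0
    for k in range((n + 1) // 2, n + 1):          # k = number of wrong jurors
        for wrong in combinations(range(n), k):   # which jurors are wrong
            wrong_set = set(wrong)
            total += prod(
                e if i in wrong_set else 1.0 - e
                for i, e in enumerate(error_rates)
            )
    return total


# Individual error rates taken from the motivation example.
rates = {"A": 0.1, "B": 0.2, "C": 0.2, "D": 0.3, "E": 0.3, "F": 0.4, "G": 0.4}
for jury in ("CDE", "ABC", "ABCDE", "ABCDEFG", "ABCFG"):
    print(jury, round(jer_bruteforce([rates[j] for j in jury]), 4))
```

The number of enumerated sets grows exponentially with the jury size, which is why the later sections look for faster ways to obtain the same value.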

The second concern is about how to promote the activity and
productivity of the crowd. Financial rewards or other
incentives (e.g. virtual commercial credits) are employed and
have proved effective. Incentive requirements vary among
workers [13], so what if we cannot enroll the best
jury due to a limited budget? In the example of Figure 1,
because users D and E ask for rewards of $0.4 and $0.65
respectively, the sum of which already exceeds the $1 budget,
the jury of A, B, C, D, and E cannot be formed. Should
we give up D and E, or should we take two cheaper but less
reliable users F and G? The result in Table 2 shows that, in
such settings, the smaller and cheaper jury {A, B, C} with
error-rate 0.072 performs better than the larger but more
expensive jury {A, B, C, F, G} with error-rate 0.104. This
dilemma reveals the second concern: how to select the best
jury with a limited budget.

In this paper, we propose a framework to form such a
crowdsourcing function on micro-blog services, and particularly
investigate the power of crowds to tackle decision making
tasks with higher quality. In general, we make the following
contributions:

1. We propose the AltrM and PayM models to describe the
behavior of crowd workers in crowdsourcing applications,
and we formally propose the Jury Selection Problem (JSP)
on both models;

2. We explicitly state the complexity of calculating the Jury
Error Rate and propose efficient algorithms with lower
bounding criteria;

3. To solve JSP, we prove the monotonicity of JER within
a fixed size of jury, and propose an efficient algorithm to
tackle JSP on the AltrM model;

4. We prove that the Jury Selection Problem under the PayM
model is NP-hard, and provide a polynomial heuristic
algorithm to solve the problem;

5. We propose a method to retrieve users' individual error
rates by constructing a message forwarding graph and
ranking the users.

2. MODELS AND PROBLEM DEFINITION 

In this section, we first introduce the concept of jury and 
the fundamental voting scheme for a decision making task, 
as well as two crowdsourcing models for selecting a jury. 
Then we investigate the effect of careless jurors and the Jury 
Error Rate drawn from them. At the end of this section, we 
formally define the Jury Selection Problem. 

2.1 Voting Scheme 

Each online user of micro-blog services can serve as a ju- 
ror, and the problem is how to select jurors and aggregate 
their distributed opinions, so that the wisdom of crowds is 
best utilized. 

The term jury is used to denote a set of jurors that can
make decisions in court, and here we borrow the concept
for our decision making problem and redefine it formally as
follows:

Definition 1 (Jury). A jury $J_n = \{j_1, j_2, \ldots, j_n\} \subseteq S$
is a set of jurors with size n that can form a voting.

In most crowdsourcing applications, how to aggregate the
wisdom of crowds is an important issue. The expected result
may be a predicted value, a ranking of several items, labels
or annotations of different items, etc. There are two main
schemes for aggregating the wisdom of crowds. On one hand,
synthesis, as in market prediction, is naturally numerical and
can be aggregated arithmetically over the individual quantities.
On the other hand, for decision making tasks where
there is "no natural way to 'average' the preference of
individuals" [9], voting is the mainly adopted scheme. Voting
is also considered one of the most suitable aggregating
mechanisms when intrinsic divergence exists among all
individuals, but a hard group decision must be made after
synthesis.

We thus give the definition of a voting as follows: 

Definition 2 (Voting). A voting $V_n$ is a valid instance
of a jury $J_n$ with size n, which is a set of binary values.

2.1.1 Majority Voting 

A voting scheme defines how to aggregate a voting result 
so that a specific decision can be made. Specifically, we 
treat a voting scheme as a function defined on a voting (see 
Definition 2), and the output is a decision. 

Aggregating the opinions on a decision making task
resembles the procedure of drawing a consensus from a jury in
court. The most natural and clearest mechanism to
make a single decision is Majority Voting, which takes the
opinion that is supported by more than half of the jurors.
In order to give a clear answer, we assume the size of a jury
for Majority Voting is odd. Formally, we define Majority
Voting as follows:

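Definition 3 (Majority Voting - MV). Given a voting
$V_n$ with size n, Majority Voting is defined as

$MV(V_n) = \begin{cases} 1 & \text{if } \sum_{v \in V_n} v \ge \frac{n+1}{2} \\ 0 & \text{if } \sum_{v \in V_n} v \le \frac{n-1}{2} \end{cases}$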
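
Definition 3 can be read as the following small sketch, assuming that votes are encoded as the binary values 0 and 1 of Definition 2 and that the jury size is odd; the function name `majority_vote` is an illustrative choice.

```python
def majority_vote(votes):
    """Return the binary opinion supported by more than half of the jurors."""
    n = len(votes)
    if n % 2 == 0:
        raise ValueError("jury size must be odd so that no tie can occur")
    if any(v not in (0, 1) for v in votes):
        raise ValueError("votes must be binary (0 or 1)")
    # With an odd n, "more than half" means at least (n + 1) / 2 votes.
    return 1 if sum(votes) >= (n + 1) // 2 else 0
```

For instance, `majority_vote([1, 0, 1])` yields 1, while `majority_vote([0, 0, 1])` yields 0.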



2.1.2 Error-rate 

The appeal of collecting the wisdom of crowds lies in the fact
that, although intrinsic divergence may exist among the
participants, their collaborative opinion is still reliable. However,
uncertainty remains pervasive across individuals due to
the lack of authoritative opinions and adequate background
information. Also, from a jurisdiction perspective, where
most jurors are expected to bear nearly the same judgement
in mind, uncertainty may arise from objective differences
in the accessibility to and processing of available information.
Decision making for online users is such a case. Given
the various histories and backgrounds of individuals, we
assume that each individual has a single error-rate $\epsilon_i$, where
$\epsilon_i \in (0,1)$, indicating the probability that this particular
participant will make a judgement that conflicts with the latent
true value.

Definition 4 (Individual Error Rate - $\epsilon_i$). The
individual error rate $\epsilon_i$ is the probability that a juror conducts
a wrong voting. Specifically

$\epsilon_i = \Pr(\text{vote otherwise} \mid \text{a task with ground truth } A)$

Ground truth A can be 0 (false) or 1 (true), and is unknown
to the jury.

While utilizing the wisdom of crowds, another issue has to
be scrutinized further: how reliable are these distributed
judgements? In a voting, a subset of the jury may vote wrongly
for the reasons listed above, and there is an exponential
number of cases in which different sets of jurors make
mistakes. So here we define the concept of Carelessness to
capture such cases.

Definition 5 (Carelessness - C). The Carelessness
C is defined as the number of mistaken jurors in a jury $J_n$
during a voting, where $0 \le C \le n$.

Informally, we define a possible combination of mistaken ju- 
rors in a Jury as Minority, and straightforwardly we can 
find that there are an exponential number of such possible 
Minorities for a particular C. To exactly measure the re- 
liability of a crowd, we define the Jury Error Rate as the 
probability that a Voting v misses the true value because 
more than half of the jurors are wrong: 

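Definition 6 (Jury Error Rate - $JER(J_n)$). The
jury error rate is the probability that the Carelessness C is
at least $\frac{n+1}{2}$ for a jury $J_n$, namely

$JER(J_n) = \sum_{k=\frac{n+1}{2}}^{n} \sum_{A \in F_k} \prod_{j_i \in A} \epsilon_i \prod_{j_i \in A^c} (1 - \epsilon_i)$

where $F_k$ is the collection of all subsets of $J_n$ with size k,
$A^c = J_n \setminus A$, and $\epsilon_i$ is the individual error rate of juror $j_i$.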

As shown in the definition, a naive method to calculate 
JER is to enumerate all Minorities and accumulate their 
probabilities. This method entails an exponential number 
of product terms and renders any algorithm based on it
inefficient. We present two efficient algorithms to accelerate the
computation in Section 3.1.

Note that as a decision making problem, we assume that 
for each discerning task, there exists an objective and true 
judgement which is latent for all the participants before the 
crowd's decision is aggregated. 

2.2 Crowdsourcing Models 

In most crowdsourcing and human computing applications,
how to motivate users to participate is an interesting problem.
In this section, we present two models to describe the
most prevalent phenomena on crowdsourcing services.

2.2.1 Altruism Jurors Model

As cited in [12], people who spend a large amount of time
online are not a uniform sample of the real world; on the
contrary, white, educated people with middle and higher
incomes make up the main part of the online community. Among
them, there exist plenty of altruistic workers who are
motivated to participate in a task simply because they are
interested or feel obligated to participate. In
this case, no matter how talented the worker is, he or she
requires no extra payment as incentive, which means that
any set of such users can form and function as a jury.

Definition 7 (Altruism Jurors Model - AltrM).
While selecting a jury J from all candidate jurors (choosing
a subset $J \subseteq S$), any possible jury is allowed.
Table 3: Summary of Notations

Notation        Description
$j_i$           a juror (with index i)
$\epsilon_i$    individual error rate of juror i
$J_n$           a (candidate) jury with size n
C               the number of wrong jurors in a jury $J_n$ during a voting
$JER(J_n)$      the probability that the jury $J_n$ fails under a voting
AltrM           Altruistic Jurors Model
PayM            Pay-as-you-go Jurors Model
n               the size of a formed jury $J_n$
N               the size of all candidate jurors



Note that by "allowed" we mean that this particular
selection is legal and can be used to conduct a decision
making task. Next, we investigate the case in which
jurors ask for extra incentives, which may lead to
a jury that is not "allowed".

2.2.2 Pay-as-you-go Model 

The same thing happens in U.S. courts: quite a
portion of citizens do not feel honored when selected to
attend a trial or a hearing as a juror, due to the loss of time and
income, or simply because they feel intrinsically reluctant on
particular issues. Meanwhile, the most prevalent general-purpose
crowdsourcing platforms, like Amazon MTurk, promise
to pay the workers after they finish the task. With
monetary incentives, the jury selection procedure encounters
more complicated problems. Financial incentives may also
incur an anchoring effect, which makes the aggregation of
distributed knowledge even more sophisticated. However, at
this stage, we only focus on the case where a juror, or a
set of them, is too expensive to be selected into a jury.
Formally, we define the following model:

Definition 8 (Pay-as-you-go Model - PayM).
While selecting a jury J from all candidate jurors (choosing
a subset $J \subseteq S$), each candidate juror $j_i$ is associated
with a payment requirement $r_i$ where $r_i \ge 0$; a possible
jury J is allowed when the total payment of J is no more
than a given budget B, namely $\sum_{j_i \in J} r_i \le B$.

2.3 Problem Definition 

Here we formally define the Jury Selection Problem as an 
optimization problem. 

Definition 9 (Jury Selection Problem - JSP).
Given a candidate juror set S with size $|S| = N$, a budget
$B \ge 0$, and a crowdsourcing model (AltrM or PayM), the Jury
Selection Problem (JSP) is to select a jury $J_n \subseteq S$ with size
$1 \le n \le N$ such that $J_n$ is allowed according to the crowdsourcing
model and $JER(J_n)$ is minimized.
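
To make Definition 9 concrete, the sketch below solves tiny JSP instances by exhaustive search under either model. It is a reference illustration only, not one of the paper's algorithms. The error rates follow Figure 1 and the requirements of D and E follow the motivation example; all other payment requirements, and the names `jury_error_rate`, `solve_jsp_bruteforce` and `candidates`, are hypothetical.

```python
from itertools import combinations


def jury_error_rate(error_rates):
    """JER of an odd-sized jury, via the distribution of wrong votes."""
    dist = [1.0]                           # dist[k] = Pr(exactly k wrong so far)
    for e in error_rates:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - e)        # this juror votes correctly
            nxt[k + 1] += p * e            # this juror votes wrongly
        dist = nxt
    return sum(dist[(len(error_rates) + 1) // 2:])


def solve_jsp_bruteforce(candidates, budget=None):
    """candidates maps a name to (error_rate, requirement).

    budget=None models AltrM (every jury is allowed); otherwise PayM.
    Returns (JER, jury, total payment) of the best allowed odd-sized jury.
    """
    names = sorted(candidates)
    best = None
    for n in range(1, len(names) + 1, 2):              # odd sizes only
        for jury in combinations(names, n):
            cost = sum(candidates[j][1] for j in jury)
            if budget is not None and cost > budget:
                continue                               # not allowed under PayM
            value = jury_error_rate([candidates[j][0] for j in jury])
            if best is None or value < best[0]:
                best = (value, jury, cost)
    return best


# Requirements of A, B, C, F and G are made-up values for illustration.
candidates = {
    "A": (0.1, 0.2), "B": (0.2, 0.2), "C": (0.2, 0.2), "D": (0.3, 0.4),
    "E": (0.3, 0.65), "F": (0.4, 0.15), "G": (0.4, 0.1),
}
print(solve_jsp_bruteforce(candidates))              # AltrM
print(solve_jsp_bruteforce(candidates, budget=1.0))  # PayM with a $1 budget
```

Under these assumed payments, the AltrM search selects {A, B, C, D, E}, whereas the $1 budget of PayM forces the search back to {A, B, C}, mirroring the dilemma of the motivation example.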

For brevity, please refer to Table 3 as a collection of all 
the notations used in this paper. 

3. JURY SELECTION ALGORITHM 

Due to the variety of jurors' individual
error-rates, it is non-trivial to form the best jury in terms of
JER. An intuitive idea is that the best jury is
selected from the best jurors, which means we can sort all the
individuals with respect to their error-rates before selection.
But how should we decide the size of a jury? As presented in
the example of Section 1, the 5-juror group performs better
than the 3-juror one, but when the size increases further, a 7-juror
group does not show any superiority over the smaller jury.
Moreover, as we discussed in Section 2.1.2, even with a given
set of candidate jurors, the calculation of JER is not trivial,
let alone the optimal selection problem.

In this section, we will formally investigate calculation of 
JER, and then discuss JSP with AltrM model, along with 
two efficient algorithms. Then, we present a solution for JSP 
under PayM model and discuss its complexity. 

3.1 Calculation of JER 

To select the best jury with the minimum JER, we first 
have to calculate JER for a given jury. Theoretically, the 
number of jurors who give wrong votes on a task(the C in 
Definition 5) is a random variable which follows the Poisson- 
Binomial distribution [21]. A naive method(used in the mo- 
tivation example) to calculate this value is to enumerate all 
the Minorities and calculate the overall error-rate for each 
of them. Obviously this method is very inefficient and even 
impractical when the number of candidate jurors becomes 
large. Fortunately, we can speed up this calculation with 
dynamic programming. 

3.1.1 A Dynamic Programming Method 

To simplify the illustration of calculating JER, we here
assign an ordering $\{j_1, j_2, \ldots, j_n\}$ to the n jurors (not
necessarily sorted), and let $J_m$ denote the set $\{j_1, \ldots, j_m\}$.
The basic observation is that there are repeated calculations
of JER from a smaller jury to a larger one. Given a
jury $J_n$ with size n, if $j_n$ makes a wrong vote (it
can actually represent an arbitrary juror), the target $JER(J_n)$
becomes the probability that at least $\frac{n+1}{2} - 1$ jurors vote
incorrectly in the jury $J_n \setminus \{j_n\} = J_{n-1}$. Likewise, when
this juror makes a correct decision, $JER(J_n)$ becomes the
probability that at least $\frac{n+1}{2}$ jurors are wrong in
the smaller jury $J_{n-1}$ excluding $j_n$. Formally we have the
following lemma:

Lemma 1. The calculation of JER of a jury with size n can
be split into smaller ones:

$\Pr(C \ge L \mid J_n) = \Pr(C \ge L-1 \mid J_{n-1}) \cdot \epsilon_n + \Pr(C \ge L \mid J_{n-1}) \cdot (1 - \epsilon_n)$

where

$\Pr(C \ge 0 \mid J_m) = 1 \quad \forall\, 0 \le m \le n$
$\Pr(C \ge m \mid J_n) = 0 \quad \forall\, m > n$

Proof. Straightforward from Definition 6. □

The initial conditions have a clear meaning: $\Pr(C \ge 0 \mid J_m) = 1$
covers all situations given a jury, and $\Pr(C \ge m \mid J_n) = 0$
for $m > n$ means the number of wrong jurors cannot exceed the
size of a given jury. Then, based on Lemma 1, we can propose the
following method to calculate $JER(J_n)$.
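
As a worked instance of Lemma 1, take the jury {A, B, C} of Figure 1 with error rates (0.1, 0.2, 0.2), in that order (the ordering is an arbitrary choice made for illustration). Unrolling the recurrence with the two boundary conditions gives:

```latex
\begin{align*}
\Pr(C \ge 1 \mid J_1) &= \Pr(C \ge 0 \mid J_0)\cdot 0.1 + \Pr(C \ge 1 \mid J_0)\cdot 0.9
                       = 1 \cdot 0.1 + 0 \cdot 0.9 = 0.1 \\
\Pr(C \ge 1 \mid J_2) &= \Pr(C \ge 0 \mid J_1)\cdot 0.2 + \Pr(C \ge 1 \mid J_1)\cdot 0.8
                       = 1 \cdot 0.2 + 0.1 \cdot 0.8 = 0.28 \\
\Pr(C \ge 2 \mid J_2) &= \Pr(C \ge 1 \mid J_1)\cdot 0.2 + \Pr(C \ge 2 \mid J_1)\cdot 0.8
                       = 0.1 \cdot 0.2 + 0 \cdot 0.8 = 0.02 \\
\Pr(C \ge 2 \mid J_3) &= \Pr(C \ge 1 \mid J_2)\cdot 0.2 + \Pr(C \ge 2 \mid J_2)\cdot 0.8
                       = 0.28 \cdot 0.2 + 0.02 \cdot 0.8 = 0.072
\end{align*}
```

The last line is the jury error rate of {A, B, C} and agrees with the value 0.072 given in the motivation example.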

We present a bottom-up implementation in Algorithm 1,
which maintains a two-dimensional array E[i][j] created in Line 2.
Starting from $\Pr(C \ge 1 \mid J_1) = \Pr(C \ge 0 \mid J_0) \cdot \epsilon_1 + \Pr(C \ge 1 \mid J_0) \cdot (1 - \epsilon_1)$,
we can iteratively compute JER with an
increasing size of jury. Specifically, $\Pr(C \ge 1 \mid J_m)$ can be
calculated from $\epsilon_m$ and $\Pr(C \ge 1 \mid J_{m-1})$ because every
$\Pr(C \ge 0 \mid J_{m-1})$ is 1 by Lemma 1. After calculating from
$\Pr(C \ge 1 \mid J_1)$ to $\Pr(C \ge 1 \mid J_{\frac{n+1}{2}})$, we can further calculate all
$\Pr(C \ge 2 \mid J_m)$ from $\Pr(C \ge 1 \mid J_{m-1})$ and $\Pr(C \ge 2 \mid J_{m-1})$
in the same manner. In this way, we finally obtain the
value of $JER(J_n)$ after $\frac{n+1}{2}$ rounds.



Algorithm 1 DP-based Algorithm
Input: A jury $J_n$
Output: The Jury Error Rate $JER(J_n)$
1: $m \leftarrow \lfloor (n + 2)/2 \rfloor$;
2: create an array E[0, ..., m][0, ..., n] with all values as 0;
3: for i = 0 → m do
4:   for j = i → (n − 1)/2 + i do
5:     if i == 0 then
6:       E[i][j] ← 1;
7:     else
8:       E[i][j] ← E[i − 1][j − 1] * $\epsilon_j$ + E[i][j − 1] * (1 − $\epsilon_j$);
9:     end if
10:  end for
11: end for
12: return $JER(J_n)$ ← E[(n + 1)/2][n];



Note that in each iteration (a fixed value of l), we only need
to calculate $n - \frac{n+1}{2} + 1 = \frac{n+1}{2}$ values because
$\Pr(C \ge l \mid J_m)$ with larger l is not necessary. We then have
the following analysis.

Corollary 1. The calculation of $JER(J_n)$ entails at
most $O(n^2)$ time and at most $O(n)$ space using Dynamic
Programming.

Proof. There are in total $\frac{n+1}{2}$ rounds of iteration, and
within each iteration there are at most $n - \frac{n+1}{2} + 1 = \frac{n+1}{2}$
simple calculations according to Lemma 1. Each simple
calculation entails $O(1)$ time cost, and thus the calculation
of $JER(J_n)$ needs $O(\frac{n+1}{2} \cdot \frac{n+1}{2}) = O(n^2)$ time.

At any point, to calculate a particular $\Pr(C \ge l \mid J_m)$, only
two vectors of previously calculated values are needed, i.e. one
vector for the values with the same l and one vector for the ones
with $l - 1$. Hence, the space cost is $O(2 \cdot n) = O(n)$. □
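
The two-vector argument of Corollary 1 can be sketched as follows. This is a minimal illustration of the recurrence in Lemma 1, assuming an odd jury size; for simplicity it fills every entry with $m \ge l$ in each round rather than only the band of entries that Algorithm 1 needs, which does not change the $O(n^2)$ time and $O(n)$ space bounds. The function name `jer_dp` is illustrative.

```python
def jer_dp(error_rates):
    """JER(J_n) = Pr(C >= (n + 1) / 2), using the recurrence of Lemma 1."""
    n = len(error_rates)
    if n % 2 == 0:
        raise ValueError("Majority Voting assumes an odd jury size")
    threshold = (n + 1) // 2
    # prev[m] holds Pr(C >= l - 1 | J_m); for l - 1 = 0 it is 1 for every m.
    prev = [1.0] * (n + 1)
    for l in range(1, threshold + 1):
        # cur[m] holds Pr(C >= l | J_m), which is 0 whenever l > m.
        cur = [0.0] * (n + 1)
        for m in range(l, n + 1):
            e = error_rates[m - 1]          # error rate of juror j_m
            cur[m] = prev[m - 1] * e + cur[m - 1] * (1.0 - e)
        prev = cur                          # only two vectors are ever kept
    return prev[n]
```

On the jury {A, B, C} with error rates 0.1, 0.2 and 0.2, the second round ends with `prev[3]` equal to 0.072 up to floating-point rounding.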

3.1.2 A Divide and Conquer Method 

Since the time complexity of the dynamic programming-based
algorithm is $O(n^2)$, the calculation takes
much time when the size of the jury is large; we therefore
need to improve the efficiency of calculating JER further.

In this subsection, we propose a more efficient algorithm,
CBA (Convolution-based Algorithm), which is based
on the divide & conquer framework instead of the dynamic
programming strategy. Computing JER is equivalent
to obtaining the probability distribution of C, which
is the number of jurors who give wrong votes on a task.
The main idea of this algorithm is as follows: the
algorithm first considers the probability distribution of C as
the coefficients of a polynomial. Then, it divides the jury into
two parts and recursively calls this process. When the
jury has only one juror, the probability distribution of that
juror is considered as the coefficients of a first-order polynomial.
After partition, it uses polynomial multiplication
to merge the probability distributions of juries with smaller
sizes and finally obtains the complete probability distribution
of C. The process of divide & conquer will spend $O(n^2)$
time. However, we can use the Fast Fourier Transform (FFT)
method to speed up the process of polynomial multiplication.
Thus, the final time complexity of the CBA algorithm is
$O(n \log n)$. The pseudo-code of CBA is shown in Algorithm 2.

In Algorithm 2, we first address the special case that there
is only one candidate in $J_n$ in Lines 2-4. Then, in Lines 6
to 8, the algorithm divides the computation of $D_C$ into two
parts, and in Line 9 the convolution-based merging is
conducted via FFT. Note that the value returned in Line 11
is $D_C$, the distribution of C, in order to support recursive
calls. $JER(J_n)$ can be easily retrieved as $\sum_{i=\frac{n+1}{2}}^{n} D_C[i]$.



Algorithm 2 Convolution-based Algorithm (CBA)
Input: A jury $J_n$
Output: The $JER(J_n)$
1: if n = 1 then
2:   $D_C[0] = 1 - \epsilon_1$;
3:   $D_C[1] = \epsilon_1$;
4:   return $D_C$;
5: else
6:   Divide $J_n$ into two parts $J_{n1}$ and $J_{n2}$, where $|J_{n1}| = \lfloor n/2 \rfloor$ and $|J_{n2}| = \lceil n/2 \rceil$;
7:   $D_{C1}$ = CBA($J_{n1}$);
8:   $D_{C2}$ = CBA($J_{n2}$);
9:   $D_C$ = convolution of $D_{C1}$ and $D_{C2}$ via FFT;
10: end if
11: return $D_C$;
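
The structure of Algorithm 2 can be sketched in a few lines. The version below is an illustration under stated assumptions rather than the authors' implementation: it uses a small hand-written recursive radix-2 FFT so that it has no external dependencies (a library FFT could be substituted), and the helper names `_fft`, `_convolve`, `cba` and `jer_cba` are hypothetical.

```python
import cmath


def _fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = _fft(a[0::2], inverse)
    odd = _fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def _convolve(p, q):
    """Multiply two coefficient lists (polynomials) via FFT."""
    size = len(p) + len(q) - 1
    n = 1
    while n < size:                                   # pad to a power of two
        n *= 2
    fp = _fft(list(p) + [0.0] * (n - len(p)))
    fq = _fft(list(q) + [0.0] * (n - len(q)))
    spectrum = [x * y for x, y in zip(fp, fq)]
    return [(v / n).real for v in _fft(spectrum, inverse=True)[:size]]


def cba(error_rates):
    """Distribution D_C of wrong votes: D_C[k] = Pr(C = k)."""
    n = len(error_rates)
    if n == 0:
        raise ValueError("the jury must contain at least one juror")
    if n == 1:
        return [1.0 - error_rates[0], error_rates[0]]
    half = n // 2                                     # split into floor / ceil parts
    return _convolve(cba(error_rates[:half]), cba(error_rates[half:]))


def jer_cba(error_rates):
    """JER(J_n): sum of D_C[i] for i from (n + 1) / 2 to n (odd n assumed)."""
    n = len(error_rates)
    if n % 2 == 0:
        raise ValueError("Majority Voting assumes an odd jury size")
    return sum(cba(error_rates)[(n + 1) // 2:])
```

Because the convolution is carried out in floating point, the returned probabilities are exact only up to rounding error, so comparisons against other methods should use a small tolerance.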



3.1.3 Lower Bound-based Pruning 

Both the dynamic programming-based and the convolution-based
algorithms aim to compute JER efficiently. However,
computing JER for each $J_n$ is redundant because
only one jury is finally selected. Thus, it is important
to filter out insignificant candidate juries as early as possible.
A natural idea is to quickly find a tight lower bound of JER
to determine whether a new JER needs to be computed. Based
on the Paley-Zygmund inequality [26], we can obtain a tight
lower bound of JER as follows.

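Lemma 2 (Lower Bound-based Pruning). Given a
jury with size n, the lower bound of $JER(J_n)$ is as
follows:

$JER(J_n) \ge \frac{(1-\gamma)^2 \mu^2}{(1-\gamma)^2 \mu^2 + \sigma^2}$

where $\mu = \sum_{i=1}^{n} \epsilon_i$, $\sigma^2 = \sum_{i=1}^{n} (1 - \epsilon_i)\epsilon_i$ and
$\gamma = \frac{n+1}{2} / \mu \in (0,1)$.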

Proof. According to the definition of JER, JER is the
probability $\Pr\{C \ge \frac{n+1}{2}\}$, where C is the number
of jurors who give wrong votes on a task. Since C is
a random variable following the Poisson-Binomial distribution,
the expectation and variance of C are $\mu = \sum_{i=1}^{n} \epsilon_i$ and
$\sigma^2 = \sum_{i=1}^{n} (1-\epsilon_i)\epsilon_i$ respectively.

Based on the Paley-Zygmund inequality, for
a non-negative random variable C,

$\Pr\{C \ge \gamma E(C)\} \ge \frac{(1-\gamma)^2 \mu^2}{(1-\gamma)^2 \mu^2 + \sigma^2}$

where $E(C)$ means the expectation of the random variable
C. Hence, letting $\gamma = \frac{n+1}{2} / \mu$, we can rewrite the formula of JER
as follows:

$JER(J_n) = \Pr(C \ge \frac{n+1}{2}) = \Pr\{C \ge \gamma \cdot \mu\} \ge \frac{(1-\gamma)^2 \mu^2}{(1-\gamma)^2 \mu^2 + \sigma^2}$

□
According to Lemma 2, we can observe that the time
complexity of computing the lower bound of JER is only $O(n)$,
where n is the size of the jury. Thus, the time cost of the lower
bound calculation is smaller than that of both algorithms.
Therefore, the lower bound-based pruning should improve
the efficiency of the algorithms computing JER.
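
A linear-time evaluation of the bound might look as follows. This sketch assumes that the bound of Lemma 2 reads $(1-\gamma)^2\mu^2 / ((1-\gamma)^2\mu^2 + \sigma^2)$ with $\gamma = \frac{n+1}{2}/\mu$; the function name `jer_lower_bound` and the convention of returning `None` when $\gamma \notin (0,1)$ are illustrative choices.

```python
def jer_lower_bound(error_rates):
    """Lower bound on JER in O(n) time, or None when the bound does not apply."""
    n = len(error_rates)
    mu = sum(error_rates)                              # E(C)
    var = sum(e * (1.0 - e) for e in error_rates)      # Var(C)
    if mu <= 0.0:
        return None
    gamma = ((n + 1) / 2) / mu
    if not 0.0 < gamma < 1.0:
        return None                                    # Lemma 2 requires gamma in (0, 1)
    gap = (1.0 - gamma) * mu                           # equals mu - (n + 1) / 2
    return gap * gap / (gap * gap + var)
```

For three unreliable jurors with error rates 0.9, 0.8 and 0.9, the sketch gives about 0.514, which is indeed below the exact JER of 0.954. Note that $\gamma < 1$ only holds when the expected number of wrong jurors exceeds $\frac{n+1}{2}$, so for juries of reliable jurors (for example error rates 0.1, 0.2 and 0.2) the function returns `None` and the exact JER has to be computed.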

3.2 JSP on AltrM

3.2.1 Monotonicity with Given Jury Size

Before we reach the final algorithm for selecting an op- 
timal jury on AltrM, we firstly investigate whether JER 
follows monotonicity on individual error-rate with a given 
jury size. 

Lemma 3. The lowest JER originates from the Jurors 
with lowest individual error-rate among the candidate jurors 
set S. 

PROOF. W.l.o.g., we pick one juror $j_i$ of the n jurors in a given
jury $J_n$ with size n, and let $J_{n-1} = J_n \setminus \{j_i\}$. Then $JER(J_n)$
can be transformed as below:

$JER(J_n) = \Pr(C \ge \frac{n+1}{2} \mid J_n)$
$= \epsilon_i \cdot \Pr(C \ge \frac{n+1}{2} - 1 \mid J_{n-1}) + (1 - \epsilon_i) \cdot \Pr(C \ge \frac{n+1}{2} \mid J_{n-1})$
$= \epsilon_i \cdot (\Pr(C \ge \frac{n+1}{2} - 1 \mid J_{n-1}) - \Pr(C \ge \frac{n+1}{2} \mid J_{n-1})) + \Pr(C \ge \frac{n+1}{2} \mid J_{n-1})$
$= \epsilon_i \cdot \Pr(C = \frac{n+1}{2} - 1 \mid J_{n-1}) + \Pr(C \ge \frac{n+1}{2} \mid J_{n-1})$
$= \epsilon_i \cdot A + B$

It is obvious that $A = \Pr(C = \frac{n+1}{2} - 1 \mid J_{n-1}) > 0$, so that
the JER is a monotone increasing function with respect to the
individual error rate $\epsilon_i$. In this way, given a jury with size n
and the candidate juror set S with size N, we finally prove
Lemma 3 by contradiction:

Suppose a jury $J'_n$ has the lowest JER, and juror $j'_i$ is
one of its members but has a rank higher than n in the
candidate juror set S sorted in ascending order with respect to
the individual error rate $\epsilon$. Because $J'_n$ consists of n jurors,
there must be a juror $j_i$ which is not in $J'_n$ and whose individual
error rate $\epsilon_i$ is lower than that of $j'_i$. By substituting $j'_i$ with
$j_i$ in $J'_n$, according to the monotone increasing property
above, the resulting jury will have a lower JER, which contradicts
the assumption that $J'_n$ is optimal. □

3.2.2 Algorithm for AltrM

Based on Lemma 1 and Lemma 3, we can now propose
an efficient algorithm to solve JSP on the AltrM model: firstly,
the algorithm sorts all jurors in the candidate juror set S in
ascending order of $\epsilon$; then, varying the possible size n of
the jury from 1 to N, we calculate $JER(J_n)$; finally we return
the jury with the lowest JER as the solution.

In Line 5 of Algorithm 3, the algorithm first checks
whether $\gamma$ is less than 1, as required by Lemma 2. If $\gamma$ satisfies
the condition, the algorithm then runs a lower bounding
test in Line 6 as an early-termination condition for JER. If $\gamma$
is not less than 1, the algorithm calculates JER directly.



Algorithm 3 Framework of JSP on AltrM (AltrALG)
Input: A candidate juror set S
Output: A subset $s \subseteq S$ of the candidate juror set with the lowest
JER (Definition 6);
1: $s := \{j_1\}$, $j_c = j_1$; // $j_c$ is the largest juror in the current set
2: sort $\epsilon_i$ in ascending order into $\epsilon_{sorted}$;
3: for n = 1 : N with step of 2 do
4:   form candidate jury $J_n$ by selecting the first n jurors in $\epsilon_{sorted}$;
5:   if $\gamma(s \cup \{j_c \text{ to } j_n\}) < 1$ then
6:     if $JER_{lowerbound}(s \cup \{j_c \text{ to } j_n\}) < JER(s)$ then
7:       calculate $JER(s \cup \{j_c \text{ to } j_n\})$;
8:       update s accordingly;
9:     end if
10:  else
11:    calculate $JER(s \cup \{j_c \text{ to } j_n\})$;
12:    update s accordingly;
13:  end if
14: end for
15: return $s \subseteq S$ as the proposed jury



Note that we assume that Algorithm 2 is called to calculate 
JER. 

The time cost of Line 11 is O(N · log N) according to
Lemma 1, and there are N iterations in total. The
sorting in Line 2 costs O(N · log N) time, and each comparison
used to update s costs O(1) time. Hence the algorithm for JSP
on the AltrM model has time complexity O(N · log N · N) =
O(N^2 · log N).

The algorithm for JSP on AltrM is denoted as AltrALG 
for simplicity. 
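As a rough illustration of the procedure described above, the following Python sketch sorts the candidates by error rate, evaluates every odd-sized prefix, and keeps the one with the lowest JER. It is not the authors' implementation: the function names are hypothetical, JER is computed with a simple quadratic-time dynamic program instead of Algorithm 2, the lower-bounding test of Line 6 is omitted, and jurors are assumed to err independently.

```python
from typing import List, Sequence, Tuple


def jury_error_rate(error_rates: Sequence[float]) -> float:
    """Probability that at least (n + 1) / 2 of n independent jurors are wrong (n odd)."""
    n = len(error_rates)
    # dist[k] = probability that exactly k of the jurors processed so far are wrong
    dist = [1.0] + [0.0] * n
    for seen, eps in enumerate(error_rates, start=1):
        # Walk k downwards so dist[k - 1] still holds the previous round's value.
        for k in range(seen, 0, -1):
            dist[k] = dist[k] * (1.0 - eps) + dist[k - 1] * eps
        dist[0] *= 1.0 - eps
    return sum(dist[(n + 1) // 2:])


def altr_select(error_rates: Sequence[float]) -> Tuple[List[int], float]:
    """Return (indices of the chosen jurors, their JER) over all odd-sized sorted prefixes."""
    order = sorted(range(len(error_rates)), key=lambda i: error_rates[i])
    best_size, best_jer = 0, float("inf")
    for n in range(1, len(order) + 1, 2):  # odd jury sizes only
        jer = jury_error_rate([error_rates[i] for i in order[:n]])
        if jer < best_jer:
            best_size, best_jer = n, jer
    return order[:best_size], best_jer
```

For example, `altr_select([0.3, 0.1, 0.2])` selects all three jurors (indices `[1, 2, 0]`) with a JER of about 0.098, slightly better than the 0.1 of the best single juror. Because each prefix is recomputed from scratch, this sketch is slower than the bound stated above.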

3.3 JSP on PayM

In the PayM model, each candidate juror is associated with
a payment requirement r_i, and the Jury Selection Problem is
to select the best jurors within a limited budget. In
such a setting, a candidate jury may be rejected because its
total payment requirement exceeds the budget. We discuss how
to estimate the expected payment of each candidate juror
in Section 4.2.

3.3.1 NP-hardness 

Compared to a traditional 0/1 Knapsack Problem (KP),
JSP on PayM features JER as the objective function,
instead of a simple summation of the values of the selected items.
Although we have proved in Lemma 3 that, for a fixed jury size,
the JER is lowest when selecting the individuals with the lowest
error rates, we do not know its monotonicity
with respect to the size of the selected jury. These properties
make the objective function non-linear in general,
which makes the problem harder than the general Knapsack
Problem. The general 0/1 Knapsack Problem is a classic
NP-complete problem [14], and we reduce one of its variants,
the nth-order Knapsack Problem (nOKP), to JSP.

Lemma 4. JSP on PayM is NP-complete. 

Proof (Sketch of Proof of Lemma 4). We prove
Lemma 4 by proving the NP-completeness of its decision
version, the Decision JSP (DJSP), i.e. given a JSP
instance and a value v, decide whether a jury J_n can be
selected so that JER(J_n) is equal to v. According to
Definition 6, which is the objective function of JSP, this
optimization problem is an nth-order Knapsack Problem. We
then follow the proof of NP-hardness of the Quadratic Knapsack
Problem (QKP) given by H. Kellerer et al. in [14] to prove
the hardness of nOKP.

An nth-order Knapsack Problem (nOKP) is a Knapsack
problem whose objective function has the following form:

$$\text{optimize} \; \underbrace{\sum_{i_1} \sum_{i_2} \cdots \sum_{i_n}}_{n} V[i_1, i_2, \ldots, i_n] \cdot x_1 x_2 \cdots x_n$$

where V[i_1, i_2, ..., i_n] is an n-dimensional vector indicating
the profit achieved if items i_1, i_2, ..., i_n are selected
simultaneously.

Given an instance of the traditional KP, we can construct an
nOKP instance by defining the n-dimensional profit vector
as V[i, i, ..., i] = p_i for all i and V = 0 otherwise, where
p_i is the profit of item i in the traditional KP. The weight
vector and objective value remain the same. □

3.3.2 Approximate Algorithm 

Because of the complexity of JSP on PayM, we present a
best-effort heuristic algorithm to tackle this problem.
The underlying idea of the Greedy Heuristic is to sort all the
candidate jurors according to the product of their error rate
and requirement, i.e. ε_i · r_i. Then we increase the size of the
jury from 1 to N in steps of 2. Each time
an enlargement still complies with the budget constraint, we
accept it only after validating that it improves the
JER. The difference between this algorithm and the traditional
greedy algorithm for the 0/1 Knapsack Problem is that
when the algorithm considers a new candidate, not only the
weight but also the benefit must be compared. This is
also the reason why JSP, with JER as its objective, is harder
than the traditional KP. We present the Greedy Heuristic
Algorithm formally in Algorithm 4:



Algorithm 4 Framework of JSP on PayM - (PayALG)
Input: A set of N candidate jurors S with the vector of
individual error-rates ε and the vector of requirements
r, and a non-negative budget B
Output: A subset of the candidate juror set s ⊆ S
1: sort ε_i · r_i in ascending order into j = {j_1, j_2, ..., j_N};
2: r := 0, s := ∅, pair := 0;
3: while r_i > B do
4:   increase i in j; // find the first j_i in j, s.t. r_i < B;
5: end while
6: select j_i, s := s ∪ {j_i};
7: update accumulated requirements r := r_i;
8: for m = i + 1 : N do
9:   if pair = 0 and r_m + r < B then
10:    set j_m as pair, j_pair := j_m, r_pair := r_m;
11:    set pair flag pair := 1;
12:  else if pair = 1 and r_m + r_pair + r < B and JER(s ∪ {j_pair, j_m}) < JER(s) then
13:    select j_m and its pair, s := s ∪ {j_pair, j_m}, r := r + r_pair + r_m;
14:    set pair flag pair := 0;
15:  end if
16: end for
17: return s ⊆ S as the proposed jury



Note that r, initialized in Line 2, is the current accumulated
requirement. Due to the requirement of an odd jury size, the
greedy algorithm considers a pair of candidate jurors as one
enlargement. In Lines 3 to 6, we find the first feasible juror
whose requirement is less than the budget B. In each step, a
pair flag is set (in Line 11) to indicate that one more
candidate should be admitted before the updated JER is examined.

The algorithm for JSP on PayM is denoted as PayALG 
for simplicity. 
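The greedy procedure can be sketched in Python as follows. This is only an illustrative reading of Algorithm 4, not the authors' code: the names `pay_select` and `jury_error_rate` are hypothetical, JER is computed with a plain dynamic program under an independence assumption, a cost exactly equal to the budget is treated as affordable, and the accumulated requirement is increased whenever a pair is admitted.

```python
from typing import List, Optional, Sequence


def jury_error_rate(error_rates: Sequence[float]) -> float:
    """Probability that a majority of n independent jurors are wrong (n odd)."""
    n = len(error_rates)
    dist = [1.0] + [0.0] * n  # dist[k]: exactly k wrong jurors so far
    for seen, eps in enumerate(error_rates, start=1):
        for k in range(seen, 0, -1):
            dist[k] = dist[k] * (1.0 - eps) + dist[k - 1] * eps
        dist[0] *= 1.0 - eps
    return sum(dist[(n + 1) // 2:])


def pay_select(error_rates: Sequence[float],
               requirements: Sequence[float],
               budget: float) -> List[int]:
    """Greedy jury selection under a budget; returns indices of selected jurors."""
    # Sort candidates by error rate times requirement, ascending.
    order = sorted(range(len(error_rates)),
                   key=lambda i: error_rates[i] * requirements[i])
    # Seed the jury with the first candidate that fits the budget alone.
    start = next((pos for pos, i in enumerate(order)
                  if requirements[i] <= budget), None)
    if start is None:
        return []
    jury = [order[start]]
    spent = requirements[order[start]]
    pending: Optional[int] = None  # first half of a pair awaiting a partner

    def jer(members: List[int]) -> float:
        return jury_error_rate([error_rates[i] for i in members])

    for i in order[start + 1:]:
        if pending is None:
            if requirements[i] + spent <= budget:
                pending = i
        elif requirements[i] + requirements[pending] + spent <= budget:
            candidate = jury + [pending, i]
            if jer(candidate) < jer(jury):  # admit the pair only if JER improves
                jury = candidate
                spent += requirements[pending] + requirements[i]
                pending = None
    return jury
```

Admitting jurors two at a time keeps the jury size odd, which is why a single affordable candidate is parked in `pending` until a partner is found.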

4. PARAMETER ESTIMATION 

In this section we discuss several possible approaches
to estimating the individual error rate ε_i and the expected
payment requirement r_i in the PayM model from micro-blog
service data.

Note that obtaining a person's individual error rate and
expected cost is itself an emerging research topic, as the
power of crowdsourcing is extended from AMT to more general
platforms. Our work focuses on forming the best crowd
and aggregating answers, which is a fundamental step in this
trend. To illustrate the proposed framework completely,
we propose a method to infer the requirement from the age
of an account. In fact, any other reasonable measure can
be plugged into our framework.

4.1 Estimate Individual Error-rate 

In this subsection, we propose a possible method to estimate
users' error rates according to their authority in the domain
of knowledge where the decision is made. Basically, our approach
is to construct a user graph for Twitter data according to
the retweet ("RT") forwarding operation, and to rank users
in the constructed graph. Each user is then assigned a ranking
score, or confidence score, which represents the quality
of the user and can be directly translated to an error rate.
Details are explained as follows.

4.1.1 Graph Construction 

The Twitter social network is modeled as a graph G(V, E),
where V is the set of nodes, each of which represents a user,
and E is the set of edges. Instead of making use of
the "following-and-follower" user relationship on Twitter,
we link two nodes or users based on their retweet actions.
A retweet action means that a user quotes or re-broadcasts
another user's tweet. Intuitively, the more a user's tweets
are retweeted by other users, the more authoritative or in- 
fluential the user is. Previous work [5] has adopted retweet 
measurement for influence analysis on Twitter. It is also 
indicated that mainstream news organizations and celebri- 
ties are the major groups of people who often induce a high 
level of retweet actions. Therefore, by building a retweet- 
relationship-based graph and ranking users in the graph, we 
can identify reliability or quality of users to a large degree. 

More specifically, we define an ordered pair of users (user_1,
user_2) if user_1 has ever retweeted user_2's tweets, which
we call a retweet-relationship pair. In our Twitter data,
a tweet containing "RT @username" indicates that username's
tweet is retweeted, where username is any legal
username on Twitter. There are two possible cases that
suggest the existence of a retweet-relationship pair in our
Twitter data:

1. A tweet t released by a user user_1 contains one and
only one substring "RT @username"
2. A tweet t released by a user user_1 contains more than
one substring "RT @username"

where username is any legal username. 

In the first case, let user_2 be the user whose username appears in
the substring "RT @username"; then (user_1, user_2) is
a retweet-relationship pair. In the second case, let user_2,
user_3, ..., user_N be the users whose usernames are contained
in the substrings "RT @username", listed in the order in which
they appear in the tweet t. Such a tweet t typically
indicates a retweet-relationship chain: user_N is
the original author, user_{N-1} retweets user_N's tweet, ...,
and user_1 retweets user_2's tweet, which is released as t.
For this retweet-relationship chain, we extract N-1
retweet-relationship pairs (user_1, user_2), (user_2, user_3), ...,
and (user_{N-1}, user_N).

For the set of retweet-relationship pairs found in our Twitter
data, we link user_1 to user_2 once and only once for each
pair (user_1, user_2), which results in a directed user graph.



Algorithm 5 Graph Construction
Input: Tweets dataset T. Each record r(t, author) includes
the tweet's content t and author author
Output: A directed graph G(V, E)
1: Set V = ∅
2: Set E = ∅
3: for each r(t, author) ∈ T do
4:   last_user = author
5:   Add last_user to V
6:   while t contains substring str = 'RT @[\w]+[\W]+' do
7:     Extract username user_retweeted = '[\w]+' from str
8:     Add user_retweeted to V
9:     Add edge(last_user → user_retweeted) to E
10:    Delete str from t
11:    last_user = user_retweeted
12:  end while
13: end for
14: return G(V, E);
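A minimal Python sketch of this graph construction is given below, using the standard `re` module. The function name and the `(author, content)` record layout are assumptions made for illustration, and the pattern is slightly simplified: it does not require trailing non-word characters after the username.

```python
import re
from typing import Iterable, Set, Tuple

# Matches "RT @username" and captures the username (word characters only).
RT_PATTERN = re.compile(r"RT @(\w+)")


def build_retweet_graph(
    tweets: Iterable[Tuple[str, str]]
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Build a directed user graph from (author, content) tweet records.

    An edge (a, b) means that a retweeted b. Edges are kept in a set, so each
    retweet-relationship pair is linked once and only once.
    """
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for author, content in tweets:
        last_user = author
        nodes.add(last_user)
        # Scan the "RT @..." markers left to right to follow the retweet chain.
        for match in RT_PATTERN.finditer(content):
            retweeted = match.group(1)
            nodes.add(retweeted)
            edges.add((last_user, retweeted))
            last_user = retweeted
    return nodes, edges
```

For a record `("alice", "RT @bob: RT @carol: hello")`, the sketch yields the chain alice → bob → carol, i.e. the two pairs `("alice", "bob")` and `("bob", "carol")`.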



4.1.2 User Ranking 

In order to measure the quality of users, we need to rank
users in the constructed graph. The popular webpage ranking
algorithms HITS [15] and PageRank [22] have been applied
to solve expert location problems in social networks [29].
Since our constructed graph is also a directed and connected
user network that is suitable for graph-based ranking
algorithms, we also employ HITS and PageRank on the user
graph to obtain quality or confidence scores of users.

By employing HITS, we obtain authority scores and hub scores
for users, and we adopt the authority scores as quality
scores. The scores calculated by PageRank are
used directly as quality scores.

We present the framework for estimating user scores by
HITS in Algorithm 6 and by PageRank in Algorithm 7. We
find in the real dataset that most top-ranking users discovered
by PageRank overlap with the ones identified by HITS.
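The two ranking schemes can be sketched in Python as below. This is an illustrative sketch under several assumptions that are not stated in the text: the function names are hypothetical, a fixed number of iterations replaces the unspecified stopping rule, scores are normalized by their sum, the damping factor defaults to 0.85, and nodes without outgoing edges receive no special treatment.

```python
from typing import Collection, Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # (u, v) means u retweeted v


def _normalize(scores: Dict[Hashable, float]) -> None:
    """Scale scores in place so that they sum to 1 (if possible)."""
    total = sum(scores.values())
    if total > 0:
        for key in scores:
            scores[key] /= total


def hits_authority(nodes: Collection[Hashable], edges: Collection[Edge],
                   iterations: int = 50) -> Dict[Hashable, float]:
    """Authority scores from HITS, used as quality scores."""
    score = {v: 1.0 for v in nodes}
    hub = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        score = {v: 0.0 for v in nodes}
        for u, v in edges:          # authority: sum of hubs pointing at v
            score[v] += hub[u]
        _normalize(score)
        hub = {v: 0.0 for v in nodes}
        for u, v in edges:          # hub: sum of authorities u points at
            hub[u] += score[v]
        _normalize(hub)
    return score


def pagerank(nodes: Collection[Hashable], edges: Collection[Edge],
             d: float = 0.85, iterations: int = 50) -> Dict[Hashable, float]:
    """PageRank scores, used directly as quality scores."""
    n = len(nodes)
    out_degree = {v: 0 for v in nodes}
    in_set: Dict[Hashable, List[Hashable]] = {v: [] for v in nodes}
    for u, v in edges:
        out_degree[u] += 1
        in_set[v].append(u)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # The comprehension reads the old scores and builds the new ones.
        score = {v: (1.0 - d) / n
                    + d * sum(score[u] / out_degree[u] for u in in_set[v])
                 for v in nodes}
    return score
```

Both functions accept the node and edge sets of the retweet graph; `edges` should contain each pair only once so that repeated retweets are not counted twice.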

4.1.3 Error-rate 

Due to the Power law distribution characteristics of social 
network users, and also for the ease of differentiating the 
qualities among all candidate users, we normalize the score 



Algorithm 6 Quality Score Calculation with HITS
Input: A directed graph G(V, E)
Output: Quality scores Score for each user ∈ V
1: Initialize Score and Hub to 1
2: while iteration not ends do
3:   Reset Score to 0
4:   for each edge(u, v) ∈ E do
5:     Score[v] = Score[v] + Hub[u]
6:   end for
7:   Normalize Score
8:   Reset Hub to 0
9:   for each edge(u, v) ∈ E do
10:    Hub[u] = Hub[u] + Score[v]
11:  end for
12:  Normalize Hub
13: end while
14: return Score;



Algorithm 7 Quality Score Calculation with PageRank
Input: A directed graph G(V, E)
Output: Quality scores Score for each user ∈ V
1: Set damping factor d
2: n = |V|
3: for each user ∈ V do
4:   Score[user] = 1/n
5:   Out[user] = |{v | edge(user, v) ∈ E}|
6:   In_Set[user] = {u | edge(u, user) ∈ E}
7: end for
8: while iteration not ends do
9:   for each user ∈ V do
10:    New_Score[user] = (1 − d)/n + d · Σ_{i ∈ In_Set[user]} Score[i]/Out[i]
11:  end for
12:  Copy New_Score to Score
13: end while
14: return Score;



of each user to range in (0, 1) as follows, where α and β are
normalization factors (settings are given in Section 5.2):

$$\varepsilon_i = \beta^{-\alpha (score_i - min)/(max - min)}$$

where min and max are the minimum and maximum score
values obtained from Algorithm 6 and Algorithm 7.

4.2 Integrated Cost Estimate in PayM Model 

There are several works related to user profiling and community
inference on social networks [30], drawing on all kinds of
information such as online behaviors [20] or even user names
[28]. Based on these attributes of users, we can infer the
tastes and preferences of users [17].

The task of further determining the individual requirement
for each user is a domain-specific procedure, and
needs careful design according to the different types of tasks
to be proposed. The details of such a mechanism are out of the
scope of this work, and we propose an optional indicator to
estimate the individual requirement r_i.

Inferring from Account Age 

Here we propose to use a single attribute as the indicator 
of individual requirement: the age of a user account since 
registration. We roughly assume that, the more experienced 
a user is, the less he or she will be interested in a task. 
Figure 2: System Overview



After selecting a candidate set S* according to the individual
error rate as in Section 4.1.2, we retrieve the age t_i of
each user from his or her registration date. The value of the
individual requirement can be estimated as follows:

$$r_i = \frac{t_i - min}{max - min}$$

where min and max are the extreme values of the
estimated account age over all users.
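The min-max scaling of account ages can be illustrated with the short Python sketch below. The function name and the choice to return 0 for every user when all ages are equal are assumptions made here; the text does not cover that degenerate case.

```python
from typing import List, Sequence


def requirements_from_age(ages: Sequence[float]) -> List[float]:
    """Scale account ages to [0, 1]; older accounts get a higher requirement."""
    youngest, oldest = min(ages), max(ages)
    if oldest == youngest:
        # Assumed fallback: identical ages carry no information.
        return [0.0] * len(ages)
    return [(t - youngest) / (oldest - youngest) for t in ages]
```

For account ages of 100, 300 and 200 days, the estimated requirements are 0.0, 1.0 and 0.5 respectively.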

4.3 System Overview 

Although this study mainly focuses on the modeling and
algorithmic solution of the practical jury selection problem,
we briefly present a conceptual system overview in Figure 2
for better illustration.

The system has two main parts: one estimates the
individual error rate and requirement for a large set
of candidates, and the other selects
the best crowd. As illustrated in the upper part of Figure 2,
a subset of candidates is selected to form a jury, and
this jury reaches a final Yes/No decision via the Majority
Voting scheme. For different situations, different parameter
estimation methods should be used to best capture the
candidates' characteristics.

5. EVALUATION 

In this section, we present our experimental evaluation of 
the performance of AltrALG and PayALG, as well as an 
experimental study of the JSP problem, namely the rela- 
tionship among individual error rates, the optimal jury size 
and given budgets. 

In Section 5.1, we use synthetic datasets, which follow normal
distributions with varying means and variances, to evaluate
the performance of both algorithms.
In Section 5.2, we retrieve candidate juror data from real
micro-blog service data (Twitter) by following the algorithms
described in Section 4.

All the experiments are conducted on an Intel(R) Core(TM) 
i7 3.40GHz PC with 8GB memory, running on Microsoft 
Windows 7. 

5.1 Synthetic Data 

To simulate individual error rates and requirements with- 
out bias, in this section we produce synthetic datasets fol- 
lowing normal distributions with varying mean values and 
variance values. JSP characteristics are investigated both 



on AltrM and PayM models. Then we evaluate the effi- 
ciency of AltrALG on AltrM model and the effectiveness of 
PayALG on PayM model. 

5.1.1 Evaluation on AltrM 
JSP Traits on AltrM 

The AltrM model can be interpreted as a special
case of PayM in which all the requirements are zero or an
unlimited budget B is given. In such a case, the only concern of
JSP is to determine the size of the jury whose JER is
minimized.

The synthetic dataset is generated as follows: we generate
1,000 candidate jurors, whose individual error rates
follow a normal distribution with mean values varying from
0.1 to 0.9 and variance values from 0.1 to 0.3. We then run
AltrALG on this dataset and record the results,
as shown in Figure 3(a). In this figure, var denotes the variance
of the individual error rates.

Our findings are straightforward to interpret: when most
of the candidates are reliable, namely their individual error
rates are less than 0.5, the optimization searches over a
very flat region. This causes a seemingly random
distribution of the best jury size, as shown in the left shoulder
of the curves in Figure 3(a). On the other hand, when most of
the individuals are error-prone, meaning their individual
error rates are larger than 0.5, the best jury has to reduce its size
to keep the jury in "the hands of the few"¹. In addition,
under this synthetic dataset, the threshold for reducing the
jury size is around the point where the mean of individual
error rates is 0.5. This marks the turning point
beyond which the wisdom of the crowd may malfunction.
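The turning point at 0.5 can be checked numerically in a simplified setting. The Python sketch below assumes that all n jurors share the same error rate and err independently (a simplification made here for illustration; the experiment draws error rates from a normal distribution), so that JER reduces to a binomial tail. The function name is hypothetical.

```python
from math import comb


def jer_identical(n: int, eps: float) -> float:
    """JER of n independent jurors (n odd) who all have error rate eps."""
    majority = (n + 1) // 2  # smallest number of wrong votes that loses
    return sum(comb(n, k) * eps ** k * (1.0 - eps) ** (n - k)
               for k in range(majority, n + 1))
```

With an error rate of 0.3, the JER for n = 1, 3, 5 is 0.3, 0.216 and about 0.163, so enlarging the jury helps. With an error rate of 0.7, the same sizes give 0.7, 0.784 and about 0.837, so a single juror is the best choice.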

Efficiency on AltrM 

Because AltrALG always finds the optimal solution,
we mainly evaluate its efficiency with a growing
data size. Specifically, we track the running time of AltrALG
with an increasing input size. In this setting, we generate
datasets of individual error rates with a mean value of 0.1,
and vary the number of candidate jurors from 2,000 to 6,000
with variances of 0.05 and 0.1 respectively. The results are
shown in Figure 3(b). The line denoted by m(0.1) means the
dataset has a variance of 0.1 and the algorithm is run
without the lower-bounding check in Line 6 of AltrALG;
the line with legend m(0.1, b) means the lower-bounding
check is performed as in AltrALG.



¹ The case where "truth rests in the hands of a few."
Figure 3: Experimental Results



Figure 3(b) shows that when the data size
is small (2,000 to 3,000), the lower-bounding enhancement
unfortunately takes more running
time than the non-enhanced algorithm, which is mainly caused by
the overhead of checking the condition for lower-bounding
pruning. As the size grows,
the running time of the enhanced algorithm increases more slowly
than that of the non-enhanced one, by a ratio of O(1/log N).

5.1.2 Evaluation on PayM

Since the greedy heuristic algorithm on PayM runs with 
a linear time cost, instead of focusing on the efficiency issue, 
in this subsection we investigate the relationship between 
the quality of a selected jury and the given budget. More- 
over, we evaluate the quality of our selection algorithm by 
comparing JER and the total cost with ground truth. 

JSP Traits on PayM 

JSP on PayM is a classic situation where most crowd- 
sourcing applications are conducted, and the influence of 
budgeting is one of the essential factors in this setting. Thus, 
we investigate the relationship among the budget posed and 
the resulted JER, as well as the final cost. 

We generate a candidate juror set of size 1,000 with an individual
error rate mean of 0.2 and variance of 0.05; the
individual requirement is generated from normal distributions
with mean values of 0.4, 0.5 and 0.6 respectively and a
variance of 0.2. The given budget B varies from 0.1 to
0.5 and the results are shown in Figure 3(c) and Figure 3(d).
The line with m(0.3) as legend represents the performance
of jurors with a mean error rate of 0.3.

The results in Figure 3(c) again verify the findings in
Section 5.1.1 that for jurors with an individual error rate of
more than 0.5, the algorithm tends to reduce the size of the
selected jury but pay more for each selected juror. From
Figure 3(d), we can conclude that a rising budget can
improve jury quality by reducing JER, and a candidate set
with lower individual error rates (e.g. the one with m(0.3))
forms a better jury within the same budget.
Effectiveness on PayM 

By "effectiveness" we mean the discrepancy
between the results obtained by PayALG and the
ground truth. Due to the NP-hardness of JSP on PayM,
we calculate the ground truth by enumerating all possible
combinations of jurors and checking which combination
achieves the lowest JER while satisfying the budget
requirement. Since the running time of this enumeration increases
exponentially with the number of candidates, we
generate a candidate juror set of size only 22. The
error rates of these candidates follow normal distributions
with mean 0.2 and variances of 0.05 and 0.1 respectively;
the individual requirement also follows a normal
distribution with mean 0.05 and variance 0.2. We vary
the budget from 1 to 3 in steps of 0.2, and the results are
shown in Figure 3(e) and Figure 3(f). In the legend, "APPX"
represents the results from Algorithm 4, and "OPT"
represents those from the ground truth.
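The enumeration used to obtain the ground truth can be sketched in Python as follows. The sketch is an assumption-laden illustration rather than the code used in the experiments: the function names are hypothetical, only odd jury sizes are enumerated (in line with the odd-size requirement of majority voting), jurors are assumed to err independently, and a cost equal to the budget is treated as feasible.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def jury_error_rate(error_rates: Sequence[float]) -> float:
    """Probability that a majority of n independent jurors are wrong (n odd)."""
    n = len(error_rates)
    dist = [1.0] + [0.0] * n  # dist[k]: exactly k wrong jurors so far
    for seen, eps in enumerate(error_rates, start=1):
        for k in range(seen, 0, -1):
            dist[k] = dist[k] * (1.0 - eps) + dist[k - 1] * eps
        dist[0] *= 1.0 - eps
    return sum(dist[(n + 1) // 2:])


def optimal_jury(error_rates: Sequence[float],
                 requirements: Sequence[float],
                 budget: float) -> Tuple[List[int], float]:
    """Exhaustively find the odd-sized jury with the lowest JER within budget."""
    best: Tuple[int, ...] = ()
    best_jer = float("inf")
    candidates = range(len(error_rates))
    for size in range(1, len(error_rates) + 1, 2):
        for jury in combinations(candidates, size):
            if sum(requirements[i] for i in jury) > budget:
                continue  # violates the budget constraint
            jer = jury_error_rate([error_rates[i] for i in jury])
            if jer < best_jer:
                best, best_jer = jury, jer
    return list(best), best_jer
```

The number of subsets grows exponentially with the number of candidates, which is why such an enumeration is only practical for a small candidate set such as the 22 jurors used here.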

The ground-truth results in Figure 3(e) show that the
budget is indeed the constraint on forming a better jury.
Figure 3(f) shows that the heuristic PayALG achieves
the same optimal JER as the ground truth 4 times out of 11.
Moreover, the biggest discrepancy appears with the lowest budget
B = 0.5, and with an increasing budget, the JER given by
PayALG gets closer to that of the ground truth, because
a larger budget loosens the constraint on forming a
better, though sub-optimal, jury.
5.2 Real Twitter Data

The dataset we use in this section is a previously published
collection of public Twitter timeline messages, recording
random samples gathered over two days. We estimate the
individual error rate ε_i and requirement r_i from the data based
on the methods in Section 4, which use HITS and PageRank to
estimate the error rate of each user. The normalization of
individual error rates follows the equation in Section 4.1.3. There
are 689,050 nodes in total, but since most of them have very
sparse mutual 'RT' relationships, we simply choose the
5,000 users with the highest scores.

5.2.1 Evaluation on AltrM

For online services, efficiency is an important issue.
To evaluate whether the proposed algorithm is
practical, we test it on both datasets generated
from HITS and PageRank. Both datasets are normalized
according to the equation in Section 4.1.3 with parameters
α = 10 and β = 10. We evaluate the running time by varying
the number of candidate jurors from 1,000 to 5,000. The results
are shown in Figure 3(g). In the legend, "HT" stands for data
from HITS, "PR" stands for data from PageRank, and
"-B" stands for the results achieved with the lower-bounding
enhancement in Line 6 of AltrALG.

As shown in Figure 3(g), the running time of the algorithm
without the bounding enhancement is
almost the same on the two datasets. With the bounding
enhancement, however, the running
time of the algorithm on the dataset generated by PageRank
(PageRank data for short) is largely reduced, while that
on the HITS data increases. This is due to the difference between
the two datasets: after normalization, a larger portion of
users in the PageRank data have error rates close to the extremes
(0 or 1) than in the HITS data. This distribution makes
more users in the PageRank data satisfy the condition for
using the bounding enhancement, γ ∈ (0, 1) according to
Lemma 2, and thus avoids unnecessary calculation of JER.
For users in the HITS data, the overhead of checking the condition
for lower bounding incurs even more time cost.

5.2.2 Evaluation on PayM 

We evaluate the performance of the approximation algorithm
on both the HITS and PageRank datasets. Due to the
power-law distribution of online users' error rates, the size
of the best jury converges quickly and the value of JER is
thus reduced to 0. Therefore, in this subsection we focus on
providing precision and recall values for the approximation
algorithm. As previously mentioned, the ground truth comes
from enumerating all possible combinations and thus
entails exponentially increasing time cost. We therefore retrieve
the top 20 candidates via both the HITS and PageRank algorithms,
and their error rates are normalized according to the
equation in Section 4.1.3 with α = 10 and β = 10. To provide
meaningful budget values, we vary the budget B
over 0.1%M, 1%M, 10%M and 20%M, where M is the average
estimated requirement of all candidate users
multiplied by the number of candidate jurors. We present
the precision and recall values in Figure 3(h). In the legend,
"-Prec" stands for precision values and "-Rec" stands for
recall values.
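The text does not spell out how precision and recall are computed. One plausible reading, sketched below in Python as an assumption, compares the set of jurors selected by the approximation algorithm with the ground-truth jury: precision is the fraction of selected jurors that belong to the optimal jury, and recall is the fraction of the optimal jury that was selected. The function name is hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(selected: Iterable[Hashable],
                     optimal: Iterable[Hashable]) -> Tuple[float, float]:
    """Set-based precision and recall of a selected jury against the optimal one."""
    selected_set, optimal_set = set(selected), set(optimal)
    hits = len(selected_set & optimal_set)
    precision = hits / len(selected_set) if selected_set else 0.0
    recall = hits / len(optimal_set) if optimal_set else 0.0
    return precision, recall
```

Under this reading, a selected jury identical to the optimal one yields precision and recall of 1, while a selected jury `[1, 2, 3]` against an optimal jury `[1, 2, 3, 4, 5]` yields a precision of 1.0 and a recall of 0.6.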

Figure 3(h) shows that the results from the HITS data
have precision and recall of 1, while the results from the PageRank
data resemble the ground truth less closely in terms of
precision and recall. As discussed in Section 5.2.1, there
are relatively more jurors with low error rates in the PageRank
data than in the HITS data; this broadens
the feasible solution space for forming a jury and in turn
leads to the low precision and recall values. In addition,
as shown in Figure 3(i), the size of the jury formed on
PageRank data is close to that of the ground truth, and
the size of the jury formed on HITS data is always identical to
the ground truth. However, despite such low precision and recall
values, the JER given by PayALG is still low enough (0.00075) for
a credible jury.

6. RELATED WORK 

Crowdsourcing. Research on crowdsourcing overlaps with
several other topics such as social computing, human computing,
and collective/collaborative intelligence. It provides a
new problem-solving paradigm [2, 18] and has branched into
several areas. In the database community, new types of queries
are developed to aggregate distributed knowledge. [19]
proposes "Qurk" to manage crowdsourced tasks as in a relational
database. [10] proposes "CrowdDB" to organize human
intelligence to solve problems that are naturally hard for
computers.

[23] considers the situation where humans are invited to
enhance a graph search procedure, and proposes an algorithm
to find the optimal target nodes for crowd participation.
Thus the work in [23] resembles
the problem studied here in the sense of improving crowdsourcing
performance. Other works have also provided creative
uses of the wisdom of crowds in multimedia annotation [6] and
document searching [1].

Worker Quality. As illustrated in the previous sections,
crowdsourcing applications succeed when enough problem
solvers are well organized and their efforts are harvested
intelligently. However, the quality of the individual
workers is also important for the quality of the final output. A
widely cited work by Ipeirotis et al. [13] proposes to use soft
labels to improve the quality of a task finished by crowds,
because soft labels can differentiate spam workers from biased
workers. The work in [25] provides a Bayesian model that uses
maximum likelihood to infer the error rates of crowdsourcing
workers.

In the application of data sourcing from the crowd, [7]
proposes to use Markov Chain Monte Carlo to estimate the
error rate of the data from crowdsourcing activities. Hence,
work related to data cleaning can also be considered to
reconcile conflicts within a crowd [11].

In this paper, we discuss the relationship between individual
worker quality and the quality of one special product,
the reliability of voting. One major difference is that on
AMT task requesters cannot actively choose workers;
asking a question on a micro-blog service, however, is intrinsically
equipped with the "@" markup, which enables the selection
of workers.

Expert Team Formation. Another application that resembles
the problem of finding a suitable set of users is the Expert
Team Formation Problem [16]. In the Expert Team Formation
Problem, the task normally has a specific requirement for
certain skills which are possessed by different candidate
experts. At the same time, the cost of choosing an expert is also
defined, e.g. communication cost or influence on personal
relationships. Therefore the Team Formation problem is
to minimize the cost while fulfilling the skill requirements.
Besides using explicit graph constraints, the Expert Team
Formation problem based on communication activities has also
been studied. In [3, 8], email communication is used for
expertise identification.

7. CONCLUSION 

In this paper, we study the Jury Selection Problem (JSP)
for decision making tasks on micro-blog services, whose
challenges are calculating JER and finding the optimal subset
under a limited budget. We explicitly discuss the formulation
of this probability and propose two efficient algorithms to
calculate it within O(n^2) and O(n · log n) time respectively.

Models of altruistic users (AltrM) and of incentive-requiring
users (PayM) are proposed to capture characteristics of
crowdsourcing applications. The AltrM model features the
monotonicity of JER on individual error rate, and JSP on the
PayM model is NP-hard. We propose an efficient algorithm for
JSP on each model.

We verified the proposed algorithms on both synthetic and 
real datasets through extensive experiments. 